AITopics | tan 1

tan 1

The Rich and the Simple: On the Implicit Bias of Adam and SGD

Neural Information Processing Systems, Jun-23-2026, 01:44:13 GMT

Adam is the de facto optimization algorithm for several deep learning applications, but an understanding of its implicit bias and how it differs from other algorithms, particularly standard first-order methods such as (stochastic) gradient descent (GD), remains limited. In practice, neural networks (NNs) trained with SGD are known to exhibit simplicity bias -- a tendency to find simple solutions. In contrast, we show that Adam is more resistant to such simplicity bias. First, we investigate the differences in the implicit biases of Adam and GD when training two-layer ReLU NNs on a binary classification task with Gaussian data. We find that GD exhibits a simplicity bias, resulting in a linear decision boundary with a suboptimal margin, whereas Adam leads to much richer and more diverse features, producing a nonlinear boundary that is closer to the Bayes' optimal predictor. This richer decision boundary also allows Adam to achieve higher test accuracy both in-distribution and under certain distribution shifts. We theoretically prove these results by analyzing the population gradients. Next, to corroborate our theoretical findings, we present extensive empirical results showing that this property of Adam leads to superior generalization across various datasets with spurious correlations, where NNs trained with SGD are known to show simplicity bias and do not generalize well under certain distributional shifts.
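As a rough illustration of one mechanical difference between the two optimizers contrasted above, the sketch below applies a single GD step and a single Adam step to the same gradient. It uses the standard Adam update with bias correction; the function names, the toy gradient, and the learning rate are assumptions made for this example, and it does not reproduce the paper's analysis of implicit bias.

```python
import math

def gd_step(w, grad, lr):
    """Plain gradient descent: each step is proportional to the gradient."""
    return [wi - lr * gi for wi, gi in zip(w, grad)]

def adam_step(w, grad, m, v, t, lr, beta1=0.9, beta2=0.999, eps=1e-8):
    """One standard Adam update with bias correction; t is the 1-based step count."""
    m = [beta1 * mi + (1 - beta1) * gi for mi, gi in zip(m, grad)]
    v = [beta2 * vi + (1 - beta2) * gi * gi for vi, gi in zip(v, grad)]
    m_hat = [mi / (1 - beta1 ** t) for mi in m]
    v_hat = [vi / (1 - beta2 ** t) for vi in v]
    w = [wi - lr * mh / (math.sqrt(vh) + eps)
         for wi, mh, vh in zip(w, m_hat, v_hat)]
    return w, m, v

# Toy setup (assumed): two coordinates whose gradients differ by four orders of magnitude.
w0 = [0.0, 0.0]
grad = [10.0, 0.001]
lr = 0.1

w_gd = gd_step(w0, grad, lr)
w_adam, _, _ = adam_step(w0, grad, [0.0, 0.0], [0.0, 0.0], t=1, lr=lr)

print("GD step:  ", w_gd)
print("Adam step:", w_adam)
```

On the first step, GD moves each coordinate in proportion to its gradient, while Adam's per-coordinate normalization moves both coordinates by roughly the learning rate. Whether and how this relates to the richer features described in the abstract is the subject of the paper itself, not of this sketch.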

artificial intelligence, deep learning, machine learning, (17 more...)

Neural Information Processing Systems

Country: North America > United States > Minnesota (0.27)

Genre:

Research Report > Experimental Study (1.00)
Research Report > New Finding (0.66)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.68)

Smooth Flipping Probability for Differentially Private Sign Random Projection Methods

Neural Information Processing Systems, Feb-9-2026, 08:55:34 GMT

Then, we propose a series of DP-SignRP algorithms that leverage the robustness of the "sign flipping probability" of random projections.
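For readers unfamiliar with sign random projections, the following is a minimal, non-private sketch. It assumes the classical setting in which each projection direction is a standard Gaussian vector, so that the probability of two vectors receiving different signs equals their angle divided by pi. The helper names, the seed, and the 2-D test vectors are assumptions for this example; the paper's DP-SignRP mechanisms, and its exact definition of the "sign flipping probability", are not reproduced here.

```python
import math
import random

def sign_random_projection(x, projections):
    """Return one sign bit (+1 or -1) per Gaussian projection direction."""
    return [1 if sum(wi * xi for wi, xi in zip(w, x)) >= 0 else -1
            for w in projections]

def disagreement_rate(bits_a, bits_b):
    """Fraction of positions at which two sign vectors differ."""
    return sum(a != b for a, b in zip(bits_a, bits_b)) / len(bits_a)

rng = random.Random(0)  # fixed seed so the run is repeatable
dim, k = 2, 20000
projections = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(k)]

# Two unit vectors separated by an angle of pi/3 (assumed test data).
theta = math.pi / 3
u = [1.0, 0.0]
v = [math.cos(theta), math.sin(theta)]

rate = disagreement_rate(sign_random_projection(u, projections),
                         sign_random_projection(v, projections))
expected = theta / math.pi  # classical sign-disagreement probability

print(f"empirical: {rate:.3f}, theoretical: {expected:.3f}")
```

The empirical disagreement rate should land close to the theoretical value, with sampling error shrinking as the number of projections grows.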

data mining, machine learning, natural language, (18 more...)

Neural Information Processing Systems

Country:

Europe > Austria > Vienna (0.14)
North America > United States > California > San Francisco County > San Francisco (0.14)
North America > United States > Louisiana > Orleans Parish > New Orleans (0.04)
(26 more...)

Genre: Research Report > New Finding (0.92)

Industry: Information Technology > Security & Privacy (1.00)

Technology:

Information Technology > Security & Privacy (1.00)
Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
(3 more...)

5e34a2b4c23f4de585fb09a7f546f527-Supplemental.pdf

Neural Information Processing Systems, Feb-8-2026, 22:26:44 GMT

attraction, equilibria, steady state, (16 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.68)

Predictive Scaling Laws for Efficient GRPO Training of Large Reasoning Models

Nimmaturi, Datta, Bhargava, Vaishnavi, Ghosh, Rajat, George, Johnu, Dutta, Debojyoti

arXiv.org Artificial Intelligence, Dec-2-2025

Fine-tuning large language models (LLMs) for complex reasoning with reinforcement learning (RL) continues to be prohibitively expensive. Through a phenomenological investigation of GRPO post-training dynamics, we identify a scaling law characterized by exponential reward saturation. The emergence of this early plateau motivates an important question: can GRPO be equipped with principled early stopping criteria to significantly reduce post-training compute while preserving downstream performance? Across four open-source models--Llama 3B/8B and Qwen 3B/7B--we perform a systematic empirical study of GRPO fine-tuning and derive scaling laws that accurately predict reward trajectories during training. Our analysis shows that GRPO reward curves are well-approximated by an exponential saturation with three phases that are consistent across all models: (i) slow initial progress, (ii) rapid improvement, and (iii) saturation. We further show that a simple parametric scaling law, conditioned on model size, initial performance, and normalized training progress, reliably predicts the onset of plateauing performance. A key practical finding is that training beyond roughly 80% of a single epoch yields negligible reward gains while consuming a substantial fraction of total computation. Using our scaling law, practitioners can forecast these phase transitions early and select data-driven stopping points, substantially reducing GRPO compute without sacrificing final performance. Our results suggest that such predictive scaling laws are a promising tool for managing GRPO finetuning costs.
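To make the idea of a saturation-based stopping rule concrete, here is a small sketch under stated assumptions. It uses a generic exponential-saturation curve, R(t) = R_max - (R_max - R_0) * exp(-k t), which is a hypothetical stand-in and not the parametric law fitted in the paper; in particular, it does not model the slow initial phase the abstract describes. All names and parameter values are chosen by hand for illustration and are not fitted to any data.

```python
import math

def saturating_reward(t, r0, r_max, k):
    """Hypothetical reward curve: starts at r0 and approaches r_max at rate k."""
    return r_max - (r_max - r0) * math.exp(-k * t)

def stopping_point(k, tol):
    """Smallest progress t at which the remaining gain is at most tol of the total gain."""
    return math.log(1.0 / tol) / k

# Illustrative values only (assumed, not taken from the paper).
r0, r_max, k = 0.2, 0.8, 4.0

t_stop = stopping_point(k, tol=0.02)
gain_fraction = (saturating_reward(t_stop, r0, r_max, k) - r0) / (r_max - r0)

print(f"stop at t = {t_stop:.3f}, fraction of total gain reached = {gain_fraction:.3f}")
```

Under this assumed curve the remaining gain decays as exp(-k t), so the stopping point has a closed form. With a fitted law of a different shape, the same rule would need a numerical solve instead.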

arxiv preprint arxiv, large language model, machine learning, (17 more...)

arXiv.org Artificial Intelligence

2507.18014

Genre: Research Report > New Finding (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

An operator splitting analysis of Wasserstein-Fisher-Rao gradient flows

Crucinio, Francesca Romana, Pathiraja, Sahani

arXiv.org Machine Learning, Nov-25-2025

Wasserstein-Fisher-Rao (WFR) gradient flows have recently been proposed as a powerful sampling tool that combines the advantages of pure Wasserstein (W) and pure Fisher-Rao (FR) gradient flows. Existing algorithmic developments implicitly make use of operator splitting techniques to numerically approximate the WFR partial differential equation, whereby the W flow is evaluated over a given step size and then the FR flow (or vice versa). This work investigates the impact of the order in which the W and FR operators are evaluated and aims to provide a quantitative analysis. Somewhat surprisingly, we show that with a judicious choice of step size and operator ordering, the split scheme can converge to the target distribution faster than the exact WFR flow (in terms of model time). We obtain variational formulae describing the evolution over one time step of both sequential splitting schemes and investigate in which settings the W-FR split should be preferred to the FR-W split. As a step towards this goal, we show that the WFR gradient flow preserves log-concavity and obtain the first sharp decay bound for WFR.

artificial intelligence, machine learning, operator, (16 more...)

arXiv.org Machine Learning

2511.1806

Country:

Europe > Italy > Piedmont > Turin Province > Turin (0.04)
Oceania > Australia > New South Wales > Sydney (0.04)

Genre: Research Report (0.81)

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

A iDP-SignRP Under Individual Differential Privacy (iDP) A.1 Relaxation: Individual Differential Privacy (iDP)

Neural Information Processing Systems, Oct-8-2025, 08:20:01 GMT

Many extensions or relaxations of DP have been proposed to improve the utility of DP mechanisms.

mnist, projection, tan 1, (9 more...)

Neural Information Processing Systems

Country:

North America > United States > California > Santa Clara County > Santa Clara (0.04)
North America > United States > California > Santa Barbara County > Santa Barbara (0.04)

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.46)

Smooth Flipping Probability for Differentially Private Sign Random Projection Methods

Neural Information Processing Systems, Oct-8-2025, 08:19:58 GMT

Then, we propose a series of DP-SignRP algorithms that leverage the robustness of the "sign flipping probability" of random projections.

probability, proceedings, projection, (14 more...)

Neural Information Processing Systems

Country:

Europe > Austria > Vienna (0.14)
North America > United States > California > San Francisco County > San Francisco (0.14)
North America > United States > Louisiana > Orleans Parish > New Orleans (0.04)
(26 more...)

Genre: Research Report > New Finding (0.92)

Industry: Information Technology > Security & Privacy (1.00)

Technology:

Information Technology > Security & Privacy (1.00)
Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
(3 more...)

Divergence Phase Index: A Riesz-Transform Framework for Multidimensional Phase Difference Analysis

Catanzariti, Magaly, Aimar, Hugo, Mateos, Diego M.

arXiv.org Machine Learning, Oct-7-2025

We introduce the Divergence Phase Index (DPI), a novel framework for quantifying phase differences in one- and multidimensional signals, grounded in harmonic analysis via the Riesz transform. Based on classical Hilbert transform phase measures, the DPI extends these principles to higher dimensions, offering a geometry-aware metric that is invariant to intensity scaling and sensitive to structural changes. We apply this method to both synthetic and real-world datasets, including intracranial EEG (iEEG) recordings during epileptic seizures, high-resolution microscopy images, and paintings. In the 1D case, the DPI robustly detects hypersynchronization associated with generalized epilepsy, while in 2D, it reveals subtle, imperceptible changes in images and artworks. Additionally, it can detect rotational variations in highly isotropic microscopy images. The DPI's robustness to amplitude variations and its adaptability across domains enable its use in diverse applications, from nonlinear dynamics and complex systems analysis to multidimensional signal processing.

divergence phase index, phase difference, riesz transform, (14 more...)

arXiv.org Machine Learning

2510.04426

Country:

South America > Argentina (0.04)
North America > United States > New Jersey > Mercer County > Princeton (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
(3 more...)

Genre: Research Report > New Finding (1.00)

Industry:

Health & Medicine > Therapeutic Area > Neurology > Epilepsy (0.87)
Health & Medicine > Therapeutic Area > Genetic Disease (0.87)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (0.94)
Information Technology > Artificial Intelligence (0.88)

Training Variation of Physically-Informed Deep Learning Models

Lenau, Ashley, Dimiduk, Dennis, Niezgoda, Stephen R.

arXiv.org Artificial Intelligence, Oct-7-2025

A successful deep learning network is highly dependent not only on the training dataset, but also on the training algorithm used to condition the network for a given task. The loss function, dataset, and tuning of hyperparameters all play an essential role in training a network, yet there is little discussion of the reliability or reproducibility of a training algorithm. The rise in popularity of physics-informed loss functions raises the question of how reliable one's loss function is in conditioning a network to enforce a particular boundary condition. Reporting the model variation is needed to assess a loss function's ability to consistently train a network to obey a given boundary condition, and provides a fairer comparison among different methods. In this work, a Pix2Pix network predicting the stress fields of high elastic contrast composites is used as a case study. Several different loss functions enforcing stress equilibrium are implemented, with each displaying different levels of variation in convergence, accuracy, and enforcement of stress equilibrium across many training sessions. Suggested practices in reporting model variation are also shared.

artificial intelligence, machine learning, training session, (18 more...)

arXiv.org Artificial Intelligence

2510.03416

Country: North America > United States > New Mexico (0.28)

Genre: Research Report > Experimental Study (0.46)

Industry: